Structural Feature Selection For English-Korean Statistical Machine Translation
Abstract
When aligning texts in very different languages such as Korean and English, structural features beyond the word or phrase level give useful information. In this paper, we present a method for selecting structural features of the two languages, from which we construct a model that assigns conditional probabilities to corresponding tag sequences in bilingual English-Korean corpora. For tag sequence mapping between the two languages, we first define a structural feature function that represents statistical properties of the empirical distribution of a set of training samples. The system, based on the maximum entropy concept, selects only features that produce high increases in the log-likelihood of the training samples. These structurally mapped features provide more informative knowledge for statistical machine translation between English and Korean. The information can also help to reduce the parameter space of statistical alignment by eliminating syntactically unlikely alignments.
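The selection criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tag sequences, the indicator features, the `min_gain` threshold, and the bounded one-dimensional weight search are all assumptions made for illustration. The sketch greedily adds the candidate feature whose best single weight yields the largest log-likelihood increase, keeping earlier weights fixed (a simplification), and stops when no candidate clears the threshold.

```python
import math

def log_likelihood(samples, labels, feats, weights):
    """Conditional log-likelihood under p(y|x) proportional to exp(sum_i w_i * f_i(x, y))."""
    total = 0.0
    for x, y in samples:
        scores = {c: sum(w * f(x, c) for f, w in zip(feats, weights)) for c in labels}
        log_z = math.log(sum(math.exp(s) for s in scores.values()))
        total += scores[y] - log_z
    return total

def gain_of(samples, labels, feats, weights, cand, bound=5.0):
    """Best log-likelihood increase from adding `cand`, fitting only its own weight.

    The log-likelihood is concave in the new weight, so a ternary search works.
    The search interval [-bound, bound] is an assumption that also keeps weights finite.
    """
    base = log_likelihood(samples, labels, feats, weights)

    def ll(alpha):
        return log_likelihood(samples, labels, feats + [cand], weights + [alpha])

    lo, hi = -bound, bound
    for _ in range(60):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if ll(m1) < ll(m2):
            lo = m1
        else:
            hi = m2
    alpha = (lo + hi) / 2
    return ll(alpha) - base, alpha

def indicator(src, tgt):
    """Hypothetical structural feature: fires when an English tag sequence maps to a Korean one."""
    return lambda x, y: 1.0 if (x, y) == (src, tgt) else 0.0

def select_features(samples, candidates, min_gain=0.5):
    """Greedily keep candidates whose log-likelihood gain is at least `min_gain`."""
    labels = sorted({y for _, y in samples})
    remaining = dict(candidates)
    names, feats, weights = [], [], []
    while remaining:
        scored = {n: gain_of(samples, labels, feats, weights, f) for n, f in remaining.items()}
        best = max(scored, key=lambda n: scored[n][0])
        gain, alpha = scored[best]
        if gain < min_gain:
            break
        names.append(best)
        feats.append(remaining.pop(best))
        weights.append(alpha)
    return names

# Toy (English tags, Korean tags) pairs; the tag names are made up for illustration.
samples = [
    ("DT NN", "NNG JKS"), ("DT NN", "NNG JKS"), ("DT NN", "NNG JKO"),
    ("VB", "VV EF"), ("VB", "VV EF"),
    ("IN NN", "NNG JKB"),
]
candidates = {
    "DT NN->NNG JKS": indicator("DT NN", "NNG JKS"),
    "VB->VV EF": indicator("VB", "VV EF"),
    "IN NN->NNG JKB": indicator("IN NN", "NNG JKB"),
    "JJ->NNG JKS": indicator("JJ", "NNG JKS"),  # never fires on the toy data
}
selected = select_features(samples, candidates)
print(selected)
```

On this toy data the feature that never fires gives zero gain and is left out, which is the pruning effect the abstract describes. A fuller treatment would refit all weights after each addition rather than holding earlier ones fixed.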
Similar resources
Applying Statistical Post-Editing to English-to-Korean Rule-based Machine Translation System
Conventional rule-based machine translation system suffers from its weakness of fluency in the view of target language generation. In particular, when translating English spoken language to Korean, the fluency of translation result is as important as adequacy in the aspect of readability and understanding. This problem is more severe in language pairs such as English-Korean. It’s because Englis...
Korean Language Resources for Everyone
This paper presents open language resources for Korean. It includes several language processing models and systems including morphological analysis, part-of-speech tagging, syntactic parsing for Korean, and standard evaluation Korean-English machine translation data with the Korean-English statistical machine translation baseline system. We make them publicly available to pave the way for further...
Korean to English Translation Using Synchronous TAGs
It is often argued that accurate machine translation requires reference to contextual knowledge for the correct treatment of linguistic phenomena such as dropped arguments and accurate lexical selection. One of the historical arguments in favor of the interlingua approach has been that, since it revolves around a deep semantic representation, it is better able to handle the types of linguistic ...
Korean Adverb Ordering in English-Korean Machine Translation Using Clustering
This paper proposes an approach to determine the ordering of Korean adverb by using clustering method for making sentences more natural at the generation stage of English-Korean machine translation system. After observing the feature information of Korean adverb classified by scholars of Korean literature, we analyze an adverb ordering about the feature information. Afterwards, we extract conse...
The BM-I2R Haitian-Créole-to-English translation system description for the WMT 2011 evaluation campaign
This work describes the Haitian-Créole to English statistical machine translation system built by Barcelona Media Innovation Center (BM) and Institute for Infocomm Research (I2R) for the 6th Workshop on Statistical Machine Translation (WMT 2011). Our system carefully processes the available data and uses it in a standard phrase-based system enhanced with a source context semantic feature that h...
Publication date: 2000